    Bayesian optimization with known experimental and design constraints for chemistry applications

    Optimization strategies driven by machine learning, such as Bayesian optimization, are being explored across the experimental sciences as an efficient alternative to traditional design of experiments. When combined with automated laboratory hardware and high-performance computing, these strategies enable next-generation platforms for autonomous experimentation. However, the practical application of these approaches is hampered by a lack of flexible software and algorithms tailored to the unique requirements of chemical research. One such requirement is the pervasive presence of constraints, both in the experimental conditions when optimizing chemical processes or protocols and in the accessible chemical space when designing functional molecules or materials. Although many of these constraints are known a priori, they can be interdependent and non-linear, and they can result in non-compact optimization domains. In this work, we extend our experiment planning algorithms Phoenics and Gryffin so that they can handle arbitrary known constraints via an intuitive and flexible interface. We benchmark these extended algorithms on continuous and discrete test functions with a diverse set of constraints, demonstrating their flexibility and robustness. In addition, we illustrate their practical utility in two simulated chemical research scenarios: the optimization of the synthesis of o-xylenyl Buckminsterfullerene adducts under constrained flow conditions, and the design of redox-active molecules for flow batteries under synthetic accessibility constraints. The tools developed constitute a simple yet versatile strategy for model-based optimization with known experimental constraints, contributing to the applicability of such optimization as a core component of autonomous platforms for scientific discovery. Comment: 15 pages, 5 figures (SI with 13 pages, 8 figures).
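
    The abstract states that Phoenics and Gryffin accept arbitrary known constraints through a user-facing interface but does not show that interface. A minimal sketch of the general idea, assuming the constraint is supplied as a plain Python predicate and using random candidate sampling in place of each algorithm's own acquisition optimizer: candidates that violate the known constraint are discarded before they are scored, so only feasible conditions are ever proposed. The names `sample_feasible`, `propose`, `flow_limit`, and `toy_acquisition`, and the quarter-circle flow constraint, are hypothetical and are not the Gryffin API.

```python
import random
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def sample_feasible(
    bounds: Dict[str, Tuple[float, float]],
    is_feasible: Callable[[Params], bool],
    n: int,
    rng: random.Random,
    max_tries: int = 100_000,
) -> List[Params]:
    """Rejection-sample n points from the box `bounds` that satisfy a known constraint."""
    samples: List[Params] = []
    for _ in range(max_tries):
        if len(samples) == n:
            break
        x = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        # The known constraint is cheap to evaluate, so it is checked before any experiment runs.
        if is_feasible(x):
            samples.append(x)
    if len(samples) < n:
        raise RuntimeError("feasible region too small for plain rejection sampling")
    return samples


def propose(
    bounds: Dict[str, Tuple[float, float]],
    is_feasible: Callable[[Params], bool],
    acquisition: Callable[[Params], float],
    n_candidates: int = 500,
    seed: int = 0,
) -> Params:
    """Return the feasible candidate with the highest acquisition value."""
    rng = random.Random(seed)
    candidates = sample_feasible(bounds, is_feasible, n_candidates, rng)
    return max(candidates, key=acquisition)


# Hypothetical interdependent, non-linear constraint: the two normalized
# flow rates must lie inside a quarter circle of radius 1.
bounds = {"flow_a": (0.0, 1.0), "flow_b": (0.0, 1.0)}


def flow_limit(x: Params) -> bool:
    return x["flow_a"] ** 2 + x["flow_b"] ** 2 <= 1.0


def toy_acquisition(x: Params) -> float:
    # Stand-in for a surrogate-based score; it simply rewards high total flow.
    return x["flow_a"] + x["flow_b"]


best = propose(bounds, flow_limit, toy_acquisition)
print(best)
```

    Rejection sampling only needs a yes/no answer from the constraint, which is why it copes with interdependent, non-linear, and non-compact feasible regions; its drawback is that it becomes slow when the feasible region is a tiny fraction of the search box.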

    Roughness of molecular property landscapes and its impact on modellability

    In molecular discovery and drug design, structure-property relationships and activity landscapes are often analyzed, qualitatively or quantitatively, to guide the navigation of chemical space. The roughness (or smoothness) of these molecular property landscapes is one of their most studied geometric attributes, as it can characterize the presence of activity cliffs; rougher landscapes are generally expected to pose harder optimization challenges. Here, we introduce a general, quantitative measure for describing the roughness of molecular property landscapes. The proposed roughness index (ROGI) is loosely inspired by the concept of fractal dimension and strongly correlates with the out-of-sample error achieved by machine learning models on numerous regression tasks. Comment: 17 pages, 6 figures, 2 tables (SI with 17 pages, 16 figures).
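
    The abstract only says that ROGI is loosely inspired by fractal dimension, so the sketch below illustrates one coarse-graining idea in that spirit and should not be read as the published ROGI formula. Assuming pairwise distances between molecules are available, it clusters the dataset with complete linkage at increasing normalized distance thresholds t, replaces each property value by its cluster mean, and integrates how much the standard deviation drops. On a rough landscape, close neighbours have very different values, so the spread collapses already at small t and the score grows. The function names, the 21-point threshold grid, and the toy 1-D datasets are assumptions for illustration.

```python
import math
from typing import List, Sequence


def complete_linkage_clusters(dist: Sequence[Sequence[float]], t: float) -> List[List[int]]:
    """Merge clusters until the closest pair (complete linkage) is farther apart than t."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        best_d, best_a, best_b = math.inf, -1, -1
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if d < best_d:
                    best_d, best_a, best_b = d, a, b
        if best_d > t:
            break
        clusters[best_a] = clusters[best_a] + clusters[best_b]
        del clusters[best_b]  # best_b > best_a, so best_a's index is unaffected
    return clusters


def coarse_grained_std(y: Sequence[float], clusters: List[List[int]]) -> float:
    """Population std after replacing every value by the mean of its cluster."""
    n = len(y)
    grand_mean = sum(y) / n
    var = 0.0
    for c in clusters:
        c_mean = sum(y[i] for i in c) / len(c)
        var += len(c) * (c_mean - grand_mean) ** 2
    return math.sqrt(var / n)


def roughness_score(dist: Sequence[Sequence[float]], y: Sequence[float], n_thresholds: int = 21) -> float:
    """Area between sigma_0 and sigma_t over normalized distance thresholds t in [0, 1]."""
    lo, hi = min(y), max(y)
    if hi == lo:
        return 0.0
    y_norm = [(v - lo) / (hi - lo) for v in y]
    d_max = max(max(row) for row in dist) or 1.0
    d_norm = [[d / d_max for d in row] for row in dist]
    ts = [k / (n_thresholds - 1) for k in range(n_thresholds)]
    sigma0 = coarse_grained_std(y_norm, [[i] for i in range(len(y))])
    # By the law of total variance, coarse-graining can only lower the std, so each gap is >= 0.
    gaps = [sigma0 - coarse_grained_std(y_norm, complete_linkage_clusters(d_norm, t)) for t in ts]
    # Trapezoidal rule over the evenly spaced thresholds.
    h = 1.0 / (n_thresholds - 1)
    return h * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))


# Toy 1-D "chemical space": ten evenly spaced points.
xs = list(range(10))
dist = [[abs(a - b) for b in xs] for a in xs]
smooth = [float(x) for x in xs]      # property changes gradually
rough = [float(x % 2) for x in xs]   # neighbours alternate between extremes
print(roughness_score(dist, smooth), roughness_score(dist, rough))
```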

    On scientific understanding with artificial intelligence

    Imagine an oracle that correctly predicts the outcome of every particle physics experiment, the products of every chemical reaction, or the function of every protein. Such an oracle would revolutionize science and technology as we know them. However, as scientists, we would not be satisfied with the oracle itself. We want more. We want to comprehend how the oracle conceived these predictions. This feat, referred to as scientific understanding, has frequently been recognized as the essential aim of science. Now, the ever-growing power of computers and artificial intelligence raises one ultimate question: how can advanced artificial systems contribute to scientific understanding, or achieve it autonomously? We are convinced that this is not merely a technical question but one that lies at the core of science. Therefore, here we set out to answer where we are and where we can go from here. We first seek advice from the philosophy of science to understand scientific understanding. Then we review the current state of the art, both from the literature and by collecting dozens of anecdotes from scientists about how they acquired new conceptual understanding with the help of computers. These combined insights help us define three dimensions of android-assisted scientific understanding: the android as (I) a computational microscope, (II) a resource of inspiration, and (III) the ultimate, not yet existent, agent of understanding. For each dimension, we explain new avenues to push beyond the status quo and unleash the full power of artificial intelligence's contribution to the central aim of science. We hope our perspective inspires and focuses research towards androids that acquire new scientific understanding and ultimately bring us closer to true artificial scientists. Comment: 13 pages, 3 figures, comments welcome.

    Free energy calculations in drug design: application to bromodomains

    Computer simulations of biomolecules have improved over the last few decades at a pace faster than Moore’s law for microprocessors. Thanks to advances in theory, hardware, and algorithms, it is increasingly possible to study biological processes at relevant spatial and temporal resolutions and to exploit simulation for quantitative predictions. One area that can potentially benefit greatly from such computational predictions is drug discovery. Since the inception of rational drug design, predicting how tightly an organic molecule binds to a macromolecular partner has been one of the chief objectives of computational chemistry. Computers already play a fundamental supporting role in the drug discovery process, and many novel approaches to studying the details of drug binding and predicting binding affinity are being actively investigated. In this thesis, I report a series of studies that evaluate the potential utility of free energy calculations based on molecular simulations for drug design. In particular, I focus on predicting the binding affinities of small molecules to bromodomains, an epigenetic target. Bromodomains are small protein modules that have been found in 46 human proteins involved in gene regulation. Given their role in various diseases, in particular cancer and inflammation, a number of bromodomain inhibitors are currently being investigated both in the laboratory and in the clinic. Here, it is shown that thorough calculations based on explicit-solvent simulations and all-atom force fields can accurately reproduce binding free energies for this protein family. Rigorous free energy calculations are also compared with more approximate methods based on post-processing the simulation trajectories in implicit solvent. Finally, a recently proposed method for estimating water binding free energies is employed to study the displaceability of water from bromodomain binding pockets.
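
    The abstract contrasts rigorous, simulation-based free energy calculations with more approximate post-processing methods without giving formulas. As a minimal sketch of two standard textbook relations (not necessarily the estimators used in the thesis, which may rely on others such as BAR or MBAR), the code below implements the Zwanzig free energy perturbation estimator, ΔG = -kT ln⟨exp(-ΔU/kT)⟩, and the conversion from a standard binding free energy to a dissociation constant, ΔG° = RT ln(Kd/c°), assuming kcal/mol units and a 1 M standard state. The ΔU samples in the usage line are made up.

```python
import math
from typing import Sequence

# Molar gas constant in kcal/(mol K); multiplied by T it gives RT (= kT per mole).
K_B_KCAL = 0.0019872041


def zwanzig_delta_g(delta_u: Sequence[float], temperature: float = 298.15) -> float:
    """Free energy perturbation (Zwanzig) estimate of dG for A -> B, in kcal/mol.

    delta_u holds U_B(x) - U_A(x) in kcal/mol for configurations x sampled from state A.
    The exponential average is computed with a log-sum-exp shift for numerical stability.
    """
    kt = K_B_KCAL * temperature
    exponents = [-du / kt for du in delta_u]
    shift = max(exponents)
    log_mean = shift + math.log(sum(math.exp(e - shift) for e in exponents) / len(exponents))
    return -kt * log_mean


def dg_to_kd(dg_bind: float, temperature: float = 298.15, c0: float = 1.0) -> float:
    """Dissociation constant (M) from a standard binding free energy: dG = RT ln(Kd / c0)."""
    return c0 * math.exp(dg_bind / (K_B_KCAL * temperature))


dg = zwanzig_delta_g([1.2, 0.8, 1.5, 0.9])  # hypothetical dU samples in kcal/mol
print(f"dG = {dg:.2f} kcal/mol; Kd at dG = -10 kcal/mol: {dg_to_kd(-10.0):.1e} M")
```

    Exponential averaging is dominated by rare low-ΔU samples, so in practice the transformation is split into many small alchemical steps; by Jensen's inequality the estimate never exceeds the plain average of ΔU.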

    A graph representation of molecular ensembles for polymer property prediction

    A graph representation that captures critical features of polymeric materials, together with an associated graph neural network, achieves higher accuracy than off-the-shelf cheminformatics methods.
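
    The one-line summary does not describe the representation itself. As a hedged illustration only, the sketch below assumes that an ensemble of polymer chains is encoded as a graph over monomer units whose edges carry a weight, read here as the probability that a given bond occurs in the ensemble, and that message passing scales each neighbour's contribution by that weight. The tanh update, the fraction-weighted readout, and the toy alternating and random copolymers are hypothetical choices, not the architecture reported in the paper.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def weighted_message_passing(
    node_feats: Dict[int, Vector],
    edges: List[Tuple[int, int, float]],
    steps: int = 2,
) -> Dict[int, Vector]:
    """Sum neighbour states scaled by edge weight, add them to each node's state, apply tanh."""
    h = {n: list(v) for n, v in node_feats.items()}
    for _ in range(steps):
        new_h: Dict[int, Vector] = {}
        for n, hv in h.items():
            msg = [0.0] * len(hv)
            for src, dst, w in edges:
                if dst == n:
                    # A bond present in only a fraction w of the ensemble contributes proportionally.
                    msg = [m + w * x for m, x in zip(msg, h[src])]
            # Parameter-free non-linear update; a real GNN would apply learned weights here.
            new_h[n] = [math.tanh(a + b) for a, b in zip(hv, msg)]
        h = new_h
    return h


def readout(h: Dict[int, Vector], node_weights: Dict[int, float]) -> Vector:
    """Weighted mean of node states, e.g. weighted by monomer fraction (an assumption)."""
    total = sum(node_weights.values())
    dim = len(next(iter(h.values())))
    return [sum(node_weights[n] * h[n][i] for n in h) / total for i in range(dim)]


feats = {0: [1.0, 0.0], 1: [0.0, 1.0]}  # one-hot monomer types A (node 0) and B (node 1)
# (source, target, weight); a self-loop stands for a bond between two units of the same type.
alternating = [(0, 1, 1.0), (1, 0, 1.0), (0, 0, 0.0), (1, 1, 0.0)]
random_copolymer = [(0, 1, 0.5), (1, 0, 0.5), (0, 0, 0.5), (1, 1, 0.5)]
fractions = {0: 0.5, 1: 0.5}
alt = readout(weighted_message_passing(feats, alternating, steps=1), fractions)
rand = readout(weighted_message_passing(feats, random_copolymer, steps=1), fractions)
print(alt, rand)
```

    In this toy example, a purely linear update would give both copolymers the same averaged embedding; the non-linearity is what lets the edge weights distinguish the two ensembles.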

    Anubis: Bayesian optimization with unknown feasibility constraints for scientific experimentation

    Model-based optimization strategies, such as Bayesian optimization (BO), have been deployed across the natural sciences in design and discovery campaigns due to their sample efficiency and flexibility. The combination of such strategies with automated laboratory equipment and/or high-performance computing in a suggest-make-measure closed loop constitutes a self-driving laboratory (SDL); SDLs have been endorsed as a next-generation technology for autonomous scientific experimentation. Despite the promise of early SDL prototypes, a lack of flexible experiment planning algorithms prevents certain prevalent types of optimization problems from being addressed. For instance, many experiment planning algorithms cannot intelligently deal with failed measurements resulting from a priori unknown constraints on the parameter space. Such constraint functions are pervasive in chemistry and materials science research, stemming from unexpected equipment failures, failed or abandoned syntheses, or unstable molecules or materials. In this work, we provide a comprehensive discussion and benchmark of BO strategies for dealing with a priori unknown constraints. These strategies learn the constraint function on the fly using a variational Gaussian process classifier and combine its predictions with the typical BO regression surrogate to parameterize feasibility-aware acquisition functions. These acquisition functions balance sampling regions of parameter space deemed promising with respect to the optimization objectives against avoiding regions predicted to be infeasible. In addition to benchmarking feasibility-aware acquisition functions on analytic benchmark surfaces, we conduct two realistic optimization benchmarks derived from previously reported studies: the inverse design of hybrid organic-inorganic halide perovskite materials with unknown stability constraints, and the design of BCR-Abl kinase inhibitors with unknown synthetic accessibility constraints. We deliver intuitive recommendations on which strategies work best in various scenarios. Overall, this work contributes to advancing the practicality and efficiency of autonomous experimentation in SDLs. All strategies introduced in this work are implemented as part of the open-source Atlas Python library.
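
    The abstract describes combining a Gaussian process classifier's feasibility predictions with the regression surrogate to build feasibility-aware acquisition functions. One common way to do this, shown here as a minimal sketch and not necessarily a variant that Atlas implements, is to multiply closed-form expected improvement (EI, written for minimization) by the predicted probability of feasibility. The posterior means, standard deviations, and feasibility probabilities below are made-up stand-ins for the outputs of the two models.

```python
import math
from typing import List, Tuple


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Closed-form EI for minimization under a Gaussian posterior N(mu, sigma^2)."""
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + sigma * pdf


def feasibility_aware_ei(mu: float, sigma: float, best: float, p_feasible: float) -> float:
    """EI scaled by the classifier's probability that the candidate is feasible."""
    return p_feasible * expected_improvement(mu, sigma, best)


# (posterior mean, posterior std, classifier P(feasible)) for two hypothetical candidates
candidates: List[Tuple[float, float, float]] = [
    (0.0, 1.0, 0.1),  # looks best to the regressor but is probably infeasible
    (0.5, 1.0, 0.9),  # slightly worse prediction, very likely feasible
]
best_so_far = 1.0

plain_scores = [expected_improvement(mu, s, best_so_far) for mu, s, _ in candidates]
aware_scores = [feasibility_aware_ei(mu, s, best_so_far, p) for mu, s, p in candidates]
plain_pick = max(range(len(candidates)), key=lambda i: plain_scores[i])
aware_pick = max(range(len(candidates)), key=lambda i: aware_scores[i])
print(plain_pick, aware_pick)
```

    Multiplying by P(feasible) is only one option; other feasibility-aware acquisition functions instead threshold or penalize the acquisition value in regions the classifier predicts to be infeasible.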